Performance Measurements of the 3D FFT on the Blue Gene/L Supercomputer

Authors

  • Maria Eleftheriou
  • Blake G. Fitch
  • Aleksandr Rayshubskiy
  • T. J. Christopher Ward
  • Robert S. Germain
Abstract

This paper presents performance characteristics of a communications-intensive kernel, the complex data 3D FFT, running on the Blue Gene/L architecture. Two implementations of the volumetric FFT algorithm were characterized, one built on the MPI library using an optimized collective all-to-all operation [2] and another built on a low-level System Programming Interface (SPI) of the Blue Gene/L Advanced Diagnostics Environment (BG/L ADE) [17]. We compare the current results to those obtained using a reference MPI implementation (MPICH2 ported to BG/L with unoptimized collectives) and to a port of version 2.1.5 of the FFTW library [14]. Performance experiments on the Blue Gene/L prototype indicate that both of our implementations scale well, and the current MPI-based implementation shows a speedup of 730 on 2048 nodes for 3D FFTs of size 128×128×128. Moreover, the volumetric FFT outperforms the FFTW port by a factor of 8 for a 128×128×128 complex FFT on 2048 nodes.
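
The scaling figures quoted in the abstract can be put in perspective with a little arithmetic. The sketch below is an illustration only and is not taken from the paper. It assumes the reported speedup is measured relative to a single node (the abstract does not state the baseline), and the function names are hypothetical.

```python
def parallel_efficiency(speedup: float, nodes: int) -> float:
    """Fraction of ideal linear speedup achieved.

    Assumption: `speedup` is relative to a one-node run.
    """
    return speedup / nodes


def points_per_node(n: int, nodes: int) -> float:
    """Average number of grid points per node for an n x n x n FFT."""
    return n ** 3 / nodes


if __name__ == "__main__":
    # Figures from the abstract: speedup 730 on 2048 nodes, 128^3 grid.
    eff = parallel_efficiency(730, 2048)
    pts = points_per_node(128, 2048)
    print(f"parallel efficiency: {eff:.1%}")
    print(f"grid points per node: {pts:.0f}")
```

Under that assumption, the efficiency works out to roughly 36%, and each node holds only 1024 of the 2,097,152 complex grid points, which shows how fine-grained the decomposition is at this node count.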

Similar resources

Scalable framework for 3D FFTs on the Blue Gene/L supercomputer: Implementation and early performance measurements

This paper presents results on a communications-intensive kernel, the three-dimensional fast Fourier transform (3D FFT), running on the 2,048-node Blue Gene/L (BG/L) prototype. Two implementations of the volumetric FFT algorithm were characterized, one built on the Message Passing Interface library and another built on an active packet Application Program Interface supported by the hardware br...

Vectorization techniques for the Blue Gene/L double FPU

This paper presents vectorization techniques tailored to meet the specifics of the two-way single-instruction multiple-data (SIMD) double-precision floating-point unit (FPU), which is a core element of the node application-specific integrated circuit (ASIC) chips of the IBM 360-teraflops Blue Gene/L supercomputer. This paper focuses on the general-purpose basic-block vectorization and optimiza...

Performance of the 3D FFT on the 6D network torus QCDOC parallel supercomputer

QCDOC is a massively parallel supercomputer with tens of thousands of nodes distributed on a six-dimensional torus network. The 6D structure of the network provides the needed communication resources for many communication-intensive applications. In this paper, we present a parallel algorithm for three-dimensional Fast Fourier Transform and its implementation for a 4096-node QCDOC prototype. Tw...

Task placement of parallel multi-dimensional FFTs on a mesh communication network

For many scientific applications, the Fast Fourier Transformation (FFT) of multi-dimensional data is the kernel which limits scalability to large numbers of processors. This paper investigates an extension of a traditional parallel three-dimensional FFT (3D-FFT) implementation. The extension within a parallel 3D-FFT consists of customized MPI task mappings between the virtual processor grid of t...

FFT specific compilation on IBM blue gene

For many numerical codes, available compilers fail to exploit the performance potential of modern processors satisfactorily. As an alternative to hand-coding and hand-tuning of basic numerical routines, the MAP special-purpose compiler was developed and adapted specifically to the requirements of codes from the signal processing domain. The new, to IBM Blue Gene Supercom...

Publication date: 2005